Abstract

We consider a multi-object detection problem over a sensor network (SNET) with limited range sensors. This 
problem complements the widely considered decentralized detection problem where all sensors observe the same 
object. While the necessity for global collaboration is clear in the decentralized detection problem, the benefits of
collaboration with limited range sensors are unclear and have not been widely explored. In this paper we develop a
distributed detection approach based on the recently developed false discovery rate (FDR) criterion. We first extend the
FDR procedure and develop a transformation that exploits complete or partial knowledge of either the observed 
distributions at each sensor or the ensemble (mixture) distribution across all sensors. We then show that this 
transformation applies to multi-dimensional observations, thus extending FDR to multi-dimensional settings. We 
also extend FDR theory to cases where distributions under both null and positive hypotheses are uncertain. We 
then propose a robust distributed algorithm to perform detection. We further demonstrate scalability to large
SNETs by showing that the upper bound on the communication complexity scales linearly with the number of
sensors that are in the vicinity of objects and is independent of the total number of sensors. Finally, we deal
with situations where the sensing model may be uncertain and establish robustness of our techniques to such 
uncertainties. 



1 Introduction 

The design and deployment of sensor networks (SNETs) for distributed decision making pose fundamental challenges
due to energy constraints and environmental uncertainties. While power and energy constraints limit collaboration
among sensor nodes, some form of collaboration is necessary to overcome uncertainty and meet the reliability
requirements of the decision making process. The general question of dealing with distributed data in the context of
detection has been an active topic of research (see [6, 17, 21-24] and references therein).

In this paper we focus on the problem of distributed detection of localized events, sources or abnormalities, 
and seek to devise a distributed detection strategy that satisfies false alarm and communication cost constraints. 
The problem of localized detection arises naturally in many setups, e.g. whenever there are multiple objects in a 
surveillance area and the sensing range of each sensor is small relative to the surveillance area. For
example, a number of objects generate a spatially confined signal field and the sensors sample the field at their
locations, as illustrated in Figure 1(b). Preliminary work along these lines has been presented in some of our earlier
papers [8, 9, 25]. We note here that this work is the first step toward identifying the set of sensors that are proximal
to a given object. Another interesting subject of research is how to fuse the information from proximal sensors 
to determine precise object location. This latter objective can possibly be accomplished through decentralized or 
distributed fusion techniques. However, we do not pursue this objective here. 

The problem under consideration complements others wherein noisy information about a single event is measured
by the entire network (global information problems). For global information problems, researchers have
investigated several architectures ranging from fusion centric to ad-hoc consensus based approaches [2, 3, 6, 12, 14,
16, 17, 21-24]. Although we do not discuss this problem in our paper, many distributed inference problems can be
subdivided into two problems: sensor selection, to select sensors in the vicinity of a target, followed by decentralized
or distributed processing of information among the selected sensors. Our paper is related to the former problem,
sensor selection, which is described in [9].



Figure 1: Decentralized Detection vs. Localized Detection: (a) Global information; (b) Limited sensing range. In
decentralized detection the sensors observe a single global phenomenon, whereas in localized detection the sensors
observe multiple local phenomena.

The problem of distributed detection of localized phenomena is closely related to the multiple hypotheses testing
problems considered in the statistical literature [13]. In multiple hypotheses testing problems a set of observations is
given, with each observation coming from one of two distributions, and the objective is to associate each observation
with its correct distribution. This is different from binary hypothesis testing problems with multiple observations,
where all the observations come from the same hypothesis. For instance, in Figure 1(a), the observations of all the
sensors are generated by a single hypothesis, i.e. presence or absence of a global phenomenon, and the hypotheses
set consists of two hypotheses. On the other hand, in Figure 1(b), the observation of each sensor is generated by its
own set of hypotheses, and the hypotheses set for the whole network can be as large as $2^m$ hypotheses, where $m$ is
the number of sensors.

Although false alarm probability is commonly controlled as the reliability criterion in classical hypothesis testing 
problems, in multiple hypotheses testing problems it invariably results in poor detection performance [1, 4, 5]. In 
order to compensate for poor detection performance, probability of false alarm can be controlled in a test-wise 
manner, a method known as uncorrected testing. The uncorrected testing can be thought of as optimizing a Bayes 
risk criterion for some object density (sparsity level). Here the risk can be the number of errors. Whenever the 
actual object density differs from the implicitly assumed density, the error rates degrade significantly. Therefore we 
consider a recently introduced reliability criterion: the BH procedure for controlling false discovery rate (FDR) [4, 
11, 19]. Briefly, FDR is the expected ratio of the number of false positives to the total number of declared positives. This
relaxation and the associated BH procedure have been shown to adapt to unknown levels of sparsity [1]. The best rule
to reduce the errors is the Bayes risk optimal policy that is tuned to the correct object density, and the BH procedure
tracks the performance of this Bayes oracle risk policy under assumptions of monotonicity of the distribution under the
significant hypothesis. As seen from Figure 2, the BH procedure tracks the performance of the Bayes oracle risk policy
utilizing either the object density or the distribution under the positive hypothesis.

Figure 2: FDR adapts to unknown sparsity levels: 100 samples with distributions $N(0, 1)$ under $H_0$ and $N(0, 3)$ under $H_1$.
Average errors were found using Monte-Carlo simulations.

Nevertheless, the BH procedure suffers from many drawbacks in the context of SNETs. First, the distribution under
the significant hypothesis need not satisfy monotonicity conditions. Moreover, it is unclear how to interpret such
single-dimensional conditions in multi-dimensional settings. On the other hand, in most SNET scenarios the observed
distributions are at least partially known and the BH procedure does not exploit this knowledge. We first develop
a transformation that exploits this knowledge to satisfy these monotonicity conditions. This transformation, while
controlling the FDR at the same level as the BH procedure, dramatically improves detection performance. The
transformation is then shown to apply to multi-dimensional observations, thus providing a natural extension to the existing
single dimensional procedure. This is particularly useful in situations where the sensed information includes object 
features in a multi-dimensional space. Next we present a distributed algorithm for SNETs whose communication 
cost, in terms of the broadcast messages, scales with the number of significant sensors in the SNET, and not the 
total number of sensors. A very interesting implication of this work is that corresponding to an FDR threshold the 
communication cost grows in proportion to the actual number of events, sources or abnormalities while achieving 
the same centralized performance. In many situations we have: (a) partial knowledge of the distributions under 
significant hypothesis; (b) estimates for the mixture distribution of the sensor observations; (c) computational errors 
introduced particularly in multi-dimensional settings. To address these situations we develop a robust extension to 
our procedure. Robustness is an important attribute in SNETs. The object intensity distribution generally follows a
power law. Therefore, the signal measured at the sensor is the superposition of signals from all the unknown objects. 

The organization of the paper is as follows: in Section 2 we present an overview of the setup. We then describe
ideal and non-ideal sensing models, describe the false discovery rate, and present the general formulation of the
problem. In Section 3 we describe the BH procedure, which controls the false discovery rate, and explain its
suboptimal nature. We then develop the domain transformed BH (DTBH) scheme and show that it outperforms
the BH procedure in terms of detection power given the same FDR constraint. We further present the solution to the
problem with the ideal sensing model via a distributed DTBH algorithm and present the scalability property. In Section 4
we perform the robustness analysis of the DTBH algorithm. We show that under certain conditions the DTBH procedure
controls the false discovery rate to within a factor of e when the ideal sensing model is relaxed. In Section 5 we present
simulations and discuss some interesting results.

2 Setup 

We consider a non-Bayesian setting where an unknown number of objects are distributed on a sensor field of m 
sensors. We assume no prior information on the number of objects, and their potential locations. Objects are 
observed by a SNET in which the sensor nodes are distributed uniformly. We wish to identify, via distributed 
strategies, the set of sensors that have an object in their sensing range. 

To simplify details pertaining to communication complexity we assume a broadcast model whereby a message 
from a sensor is broadcast to the entire network. The communication complexity is the aggregate number of messages
broadcast by the sensors in the SNET until algorithm termination.

We consider an object centric scheme in which the objects generate a signal field over the sensor network and 
the sensors sample the field at their locations. In this scheme the positive hypothesis ($H_1$) for a sensor is the event
that the sensor is within the effective region of an object, and the null hypothesis ($H_0$) is the event that the sensor is
outside the effective region of all objects.

The observation vector is denoted by $Y = (Y_s : s \in S)$, where throughout the paper $S$ represents the set
of sensors that form the SNET, and $Y_s$ represents the collection of measurements taken by sensor $s \in S$. The
realization of the observation vector $Y$ is denoted by $y = (y_s : s \in S)$. For definiteness we focus on the case when $Y$
has a continuous distribution. The cumulative probability distribution (resp. density) function of the observation
vector $Y_s$ at sensor $s$ under each hypothesis $H_s = H_i$, $i = 0, 1$, is denoted by $G_{is}(\cdot)$ (resp. $g_{is}(\cdot)$), where $H_s$
denotes the hypothesis at sensor $s$. Note that both the CDF and PDF can be suitably described for multidimensional
observations. We assume a general structure on the problem in the sense that $G_{1s}$ belongs to a class of distributions
$\mathcal{G}_1$, and $G_{0s}$ belongs to a class of distributions $\mathcal{G}_0$. With a slight abuse of notation, we will use these families for
distributions and densities where it will not cause confusion. Let $S_0 = \{s \in S : H_s = H_0\}$ with cardinality $m_0$
and $S_1 = \{s \in S : H_s = H_1\}$ with cardinality $m_1$. Here both $m_0$ and $m_1$ are unknown and the object locations are
assumed to be arbitrary, i.e. not necessarily uniformly distributed.

2.1 Mathematical Modeling 

We describe mathematical models for ideal and non-ideal sensing. The ideal sensing model accounts for situations
where objects can be sensed only if a sensor is within a fixed range of an object.

Ideal Sensing Model: When we mention ideal sensing model we mean that each object has a fixed range in which 
it generates a uniform signal and outside this range the object has no signal. Each sensor samples the field at its 
location, of course with some measurement noise: 



$$H_s = H_0 : y_s \sim g_{0s}$$
$$H_s = H_1 : y_s \sim g_{1s}$$

For example, in a linear model one may consider

$$H_s = H_0 : y_s = n_s$$
$$H_s = H_1 : y_s = \theta_t + v_s$$

where $n_s$ and $v_s$ are noise variables with known distributions, and $\theta_t$ is the uniform signal of object $t$ in the vicinity
of which sensor $s$ is present. Note that the classes of distributions $\mathcal{G}_0$ and $\mathcal{G}_1$ are singletons in the ideal case. In this
case the following factorization holds:

$$\Pr\{y_s \mid \{H_s\}_{s \in S}\} = \Pr\{y_s \mid H_s\}$$

Non-Ideal Sensing Model: Here we assume that the observed signal from each object decays as a function of 
distance between the object and the observing sensor. Therefore the object signal is no longer constant within the 
region around the object, and is no longer zero outside this region. As a consequence, sensors outside the effective 
region also observe a signal from each object. Here the received signal carries uncertainty due to unknown fading 
gains as well as observation noise; thus we assume knowledge of the models to within families that are not singletons:

$$H_s = H_0 : y_s \sim g_{0s} \in \mathcal{G}_0$$
$$H_s = H_1 : y_s \sim g_{1s} \in \mathcal{G}_1$$

An instantiation for this case is

$$H_s = H_0 : y_s = \sum_{t' : d(s,t') > d_0} \frac{\theta_{t'}}{(d(s,t') + 1)^{\alpha}} + n_s$$
$$H_s = H_1 : y_s = \frac{\theta_t}{(d(s,t) + 1)^{\alpha}} + \sum_{t' : d(s,t') > d_0} \frac{\theta_{t'}}{(d(s,t') + 1)^{\alpha}} + v_s$$

where $d(s,t)$ denotes the distance between sensor $s$ and an object $t$, $\alpha$ denotes the decay exponent, $d_0 > 0$, and
the constant one in the denominator has been added to eliminate singularities. In this model the observations are
correlated under both hypotheses, and the factorization presented for the ideal sensing model does not hold. Figure 3
illustrates these models.





Figure 3: Ideal and non-ideal sensing models: (a) Ideal sensing model; (b) Non-ideal sensing model.

To deal with the non-ideal sensing model, we relax the assumption that $\mathcal{G}_0$ and $\mathcal{G}_1$ are singletons. We deal with
this case from a robustness perspective, i.e., as a perturbation of the ideal sensing model, in the upcoming parts of
the paper.

2.2 Formulation and Objective 

Before we proceed further we present the following table, which describes important variables in our discussion. 
Here $m$ is the number of samples (or sensor nodes) known in advance. The observable random variable $R$ is the
total number of sensors that decide the positive hypothesis, and the unobservable random variable $V$ is the total number
of sensors that falsely decide the positive hypothesis. The false alarm and miss probabilities are associated with the random
variables $V$ and $T$ respectively in the above table. As we discussed in the introduction, the solutions based on
false alarm control cannot adapt to various levels of sparsity. In the appendix, we show via information-theoretic 
arguments (by appealing to Fano lower bound) that asymptotically the worst-case error probability can be bounded 
from below by the conditional entropy, obtained by substituting a uniform prior on objects and their locations. 
Consequently, either the miss rate or the false alarm rate is bounded from below by half the conditional entropy. The 
corresponding theorem is stated below: 

Theorem 2.1 Suppose $u(Y^m)$ is any strategy that maps the sensor observations to object locations. Then,

$$\gamma_w = \min_{u} \max_{H_s} \left( \Pr\{V \ge 1 \mid \{H_s : s \in S\}\} + \Pr\{T \ge 1 \mid \{H_s : s \in S\}\} \right) \ge \gamma_b$$

where

$$\gamma_b = \min_{u} \left( \Pr(V \ge 1) + \Pr(T \ge 1) \right) \ge \mathcal{H}(H_s \mid Y_s)$$

where $\mathcal{H}(\cdot \mid \cdot)$ is the conditional entropy and $\gamma_b$ is the Bayes uniform risk obtained by assuming a uniform distribution
of objects at $m$ locations. It follows that there exists no decision strategy for which both false alarm and miss
probability can simultaneously be smaller than $\mathcal{H}(H_s \mid Y_s)/2$.

Remark: $H_s$ is a binary random variable and so its entropy (or conditional entropy) is always smaller than one.
Nevertheless, depending on measurement noise at each sensor, $\mathcal{H}(H_s \mid Y_s)$ could be arbitrarily close to one.

In the FDR [4] formulation we control the worst case expected ratio of $V/R$, i.e.

$$\mathrm{FDR} = \max_{\{H_s\}_{s \in S}} E\{V/R \mid \{H_s\}_{s \in S}\}$$

For simplicity of notation, from here on we will write $\mathrm{FDR} = E\{V/R\}$ and $\Pr\{\cdot \mid \{H_s\}_{s \in S}\} = \Pr\{\cdot\}$ whenever it
is clear from the context. For completeness, we show that FDR can be expanded as follows:

$$\begin{aligned}
\mathrm{FDR} &= E\{V/(V + S)\} = E\{V/R\} \\
&= E\{V/R \mid R > 0\}\Pr\{R > 0\} + E\{V/R \mid R = 0\}\Pr\{R = 0\} \\
&= E\{V/R \mid R > 0\}\Pr\{R > 0\}
\end{aligned}$$

The last equality follows from the convention that $E\{V/R \mid R = 0\} = 0$. This is intuitively pleasing, because if no
sensors declare the presence of an object within their sensing radius then no false discoveries are committed. In this work
we seek to use the FDR framework to perform detection in the sensor network problem that has been laid out earlier.
We wish to devise a distributed detection method that controls the FDR at desired levels. The general form of our
problem is to minimize the expected miss rate subject to false alarm rate and communication cost constraints.
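
The quantity inside the expectation, $V/R$ with the convention that it equals zero when $R = 0$, can be computed for a single realization as in the minimal sketch below. The function name and the representation of sensors as set elements are assumptions made for illustration; they do not come from the formulation above.

```python
def false_discovery_proportion(declared, true_null):
    """Return V/R for one realization, using the convention V/R = 0 when R = 0.

    declared  : sensors that decided the positive hypothesis (R = len(declared))
    true_null : sensors whose true hypothesis is H_0
    """
    declared = set(declared)
    r = len(declared)                       # R: total declared positives
    v = len(declared & set(true_null))      # V: declared positives that are truly null
    return v / r if r > 0 else 0.0
```

Averaging this proportion over many independent realizations would give an empirical estimate of $E\{V/R\}$ for a fixed configuration of hypotheses.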



3 FDR Control and Domain Transformation 

In this section we take a close look at the BH procedure, expose its weaknesses, develop a domain transformation
that addresses its shortcomings, and present the solution to the detection problem with the ideal sensing model.

BH procedure: For completeness we first describe the BH procedure, which is also illustrated in Figure 4. First, the
so called p-values are computed. The p value of an observation $y_s$ is a non-unique transformation that generates
a uniform distribution under the null hypothesis. One such transformation is $p_s = P(y_s) = 1 - G_0(y_s)$, but other
transformations that are related to $\alpha$-level significance regions are possible [13, 20]. The p-values are then ordered
and the largest index $i_{max}$ such that $p_{(i)} \le \frac{i}{m}\gamma$ is chosen. All indices up to and including $i_{max}$ are labeled significant.
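
The ordering and last-crossing search described above can be sketched as follows. This is an illustrative centralized implementation only, assuming the p-values are available in one list and that `gamma` denotes the FDR level; the function name is hypothetical.

```python
def bh_procedure(p_values, gamma):
    """Return the set of indices declared significant by the BH rule."""
    m = len(p_values)
    # order[i - 1] is the index of the i-th smallest p-value.
    order = sorted(range(m), key=lambda s: p_values[s])
    i_max = 0
    for i, s in enumerate(order, start=1):
        if p_values[s] <= gamma * i / m:
            i_max = i                       # keep the LAST crossing, not the first
    # Every ordered p-value up to the last crossing is labeled significant,
    # even if it lies above its own threshold.
    return set(order[:i_max])
```

For example, with p-values `[0.015, 0.018, 0.025, 0.5, 0.9]` and `gamma = 0.05`, the smallest p-value exceeds its own threshold of 0.01, yet it is still declared significant because the third ordered p-value crosses its threshold of 0.03.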

Let $Y_{0s} \sim G_{0s}$ (resp. $Y_{1s} \sim G_{1s}$) be the observed random variable under the null (resp. positive) hypothesis at
sensor $s$. Define $P_{0s} = P(Y_{0s})$, and similarly $P_{1s} = P(Y_{1s})$, and let $F_{0s}$ and $F_{1s}$ be their corresponding distribution
functions, i.e. $P_{0s} \sim F_{0s}$ and $P_{1s} \sim F_{1s}$. The family $\mathcal{G}_0$ is transformed to a new family $\mathcal{F}_0$ and $\mathcal{G}_1$ is transformed
to a new family $\mathcal{F}_1$ by this transformation. The following theorem and its proof are presented in [4]. We state the
theorem without proof and refer the reader to [5] for further details.

Theorem 3.1 For independent test statistics under the null hypothesis, and for any configuration of positive hypotheses,
the BH procedure controls the FDR at level $\gamma m_0/m$, where $m_0$ is the number of true null hypotheses and $m$ is the
number of observations.

Figure 4: BH procedure: (a) Definition of p value; (b) Ordered p values and threshold line.

The main idea behind the proof of Theorem 3.1 lies in the simple fact that the observations under null hypotheses
are independent, and that $P_{0s} \sim U(0, 1)\ \forall s \in S$.

Shortcomings of BH Procedure: First, the BH procedure performs well only when the realizations of $P_1$ are
monotonic and clustered near zero, which requires a certain structure on the distribution of observations. This issue
is particularly problematic for multi-dimensional distributions since it is unclear how to reduce the multi-dimensional
observations to one dimension. Moreover, simple strategies such as projections to 1-D do not result in monotonicity
and clustering around zero. Second, the BH procedure does not take into account the knowledge of the probability
distributions that generate the samples under $H_1$. The focus is primarily on reducing false positives and there is no
control over the miss rate. Indeed, by suitable transformations it may be possible to realize the clustering around
zero. Third, the BH procedure does not lend itself easily to decentralized implementation. More specifically, the BH
procedure is a last crossing procedure wherein the largest p value smaller than its corresponding threshold must be
found. This requires searching among $\gamma m$ p-values, which scales with the number of sensors in the network. In this
section we address these issues by using exact knowledge of distributions. We devise a first crossing procedure
that achieves the detection power of the last crossing procedure without the communication overhead that scales with
the number of sensors. In Section [?] we extend this approach to cases where the distributions are known partially.

An example where clustering around zero is not guaranteed follows:

Example: Consider two Gaussian random variables with $Y_{0s} \sim N(0, 1)$ and $Y_{1s} \sim N(0, .01)$ for $s = 1, \ldots, m$,
and consider the FDR constraint $\gamma = .05$. Assume that we are given $m$ p values calculated via the transformation
$p(y_s) = 1 - G_0(y_s)$. The goal is to select the samples of $P_{1s}$ from a mixture of samples subject to the FDR constraint $\gamma$.
In this example most of the realizations of $P_{1s}$ are close to 0.5 rather than 0. Note, however, that the BH procedure
seeks $P_{1s}$ samples that are less than or equal to .05. Therefore it will not declare any sample of $P_{1s}$ as significant,
and the BH procedure results in zero detections. To overcome this problem, consider the following transformation on
the random variables $P_{0s}$ and $P_{1s}$:

$$\tilde{P}_{is} = |1 - 2P_{is}|, \quad i = 0, 1, \quad s = 1, \ldots, m \qquad (1)$$

Since $P_{0s}$ is uniformly distributed in (0,1), a little algebra shows that $\tilde{P}_{0s}$ is also uniformly distributed in (0,1).
Therefore we know that if we use the BH procedure on the new set of p values we can still control the FDR at .05.
Observe, however, that most of the realizations of $\tilde{P}_{1s}$ are now close to 0. When the BH procedure is performed on
this new set of p values, more of the observations coming from $H_1$ will be declared as significant, thus the detection
power is increased. ∎
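
The shift in where the p values cluster can be checked numerically. The sketch below is illustrative only: it assumes that the second parameter of $N(0, .01)$ is a variance (standard deviation 0.1), and the sample size and seed are arbitrary choices. It reports summary statistics of the p values before and after transformation (1), not detection counts.

```python
import math
import random

def normal_cdf(y):
    """CDF of N(0, 1), playing the role of the null distribution G_0."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def p_value(y):
    """p(y) = 1 - G_0(y)."""
    return 1.0 - normal_cdf(y)

def fold(p):
    """Transformation (1): |1 - 2p|."""
    return abs(1.0 - 2.0 * p)

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(0)                       # fixed seed, illustrative only
n = 5000                                     # hypothetical sample size per hypothesis
p0 = [p_value(rng.gauss(0.0, 1.0)) for _ in range(n)]   # null samples
p1 = [p_value(rng.gauss(0.0, 0.1)) for _ in range(n)]   # positive samples (variance .01 assumed)

mean_p1_raw = mean(p1)                           # positive p values sit around 0.5
mean_p1_folded = mean([fold(p) for p in p1])     # after (1) they sit close to 0
mean_p0_folded = mean([fold(p) for p in p0])     # null p values keep a mean near 0.5
raw_p1_below_gamma = sum(p <= 0.05 for p in p1)  # raw positive p values at or below .05
```

Under these assumptions none of the raw positive p values falls at or below .05, while the transformed ones concentrate near zero and the transformed null p values still look uniform.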

Such cases where realizations of $P_1$ are away from zero can arise in many situations. Another example is when
$Y_{0s}$ and $Y_{1s}$ are exponential random variables with parameters $\lambda_{0s} = 2$ and $\lambda_{1s} = 1$ respectively. In this case,
using the p value definition $p(y_s) = 1 - G_0(y_s)$, the realizations of $P_{1s}$ are close to 1. A strategy similar to that of the
previous example can be employed to resolve the issue. In fact, a different definition of p value can be used to
evade this problem altogether. However, in more general cases, for example when the null distribution is a mixture
distribution, finding a suitable p value definition may not be evident or simple.

More interesting examples arise when we consider multivariate distributions. We give an example of this nature
in the sequel. Generally speaking, computationally convenient definitions of p values do not generate $P_0$ and $P_1$
distributions suitable for direct application of the BH procedure. Therefore we must make use of the knowledge of
probability distributions that generate samples under positive hypotheses and transform the p values accordingly.

3.1 Domain Transformed BH Procedure 

We now develop a method to overcome the issues we observed with the BH procedure. The main idea of this section
is motivated by our example, and is based on the following insight: recall that all that is necessary for the BH procedure
to control FDR is that (a) the observations be independent under null hypotheses, and (b) p values be distributed $U(0, 1)$
under null hypotheses. Assume that we can find a transformation $T$ such that $T(P_1)$ is concentrated near 0, and
$T(P_0) \sim U(0, 1)$. Then, (1) since $T(P_0) \sim U(0, 1)$ we are not interfering with conditions (a) and (b) above, hence
the BH procedure will control the FDR with the new p values; (2) $T$ maps samples of $P_1$ to near 0, and generates a new
data set that is more suitable for the BH procedure. We restate (1) as a proposition below and formally examine (2) in
the upcoming sections.

Proposition 3.2 Let $p_1, p_2, \ldots, p_m$ be a set of p values such that $P_0 \sim U(0, 1)$ and let $T : (0, 1) \to (0, 1)$ be a function.
If $T(P_0) \sim U(0, 1)$, then the BH procedure controls FDR at desired levels when applied to $T(p_1), T(p_2), \ldots, T(p_m)$,
and we say that $T$ is measure invariant with respect to $P_0$.

Proof: Since the distribution under the null hypothesis is preserved to be $U(0, 1)$, following the proof of Theorem 3.1
gives the result stated in the proposition. ∎

The proposed transformation is a reorientation of the p domain. It depends on the distribution of observations
under positive hypotheses, but not the realizations themselves. Its main features are: (a) it preserves the uniform distribution
of p values under the null hypothesis, which implies that the FDR constraint is not violated; (b) it maps an arbitrary
PDF of p values to a monotonically decreasing one, which leads to improved detection rates (see Figure 5). We will
see how to extend this idea to the multi-dimensional setting in the following section.




Figure 5: Illustration of domain transform: a monotonically decreasing density is obtained. 



3.2 Transformation of p Domain and Multi-dimensional FDR control 

Without loss of generality, assume that the null distribution is $P_{0s} \sim U^k(0, 1)$. Suppose $f_1(\cdot)$ is the
corresponding density under $H_1$, supported in the unit cube.

Special Case when $f_1$ is nowhere constant: Let $\tilde{p}(y) = \Pr\{p : f_1(p) > f_1(y)\}$ be the transformation, where the
probability is taken with respect to the uniform (null) distribution. This
transformation involves computing the volume of level sets under $f_1$. The fact that this transformation satisfies the
desired objective will be shown as part of the general transformation later in this section. We next describe an
example to illustrate the utility of the transformation in a multi-variate problem.
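
In one dimension the level-set volume can be approximated on a grid. The following is a minimal sketch, not the procedure used in the paper: the midpoint grid, the function names, and the choice of a Beta(2, 2) density as $f_1$ are all assumptions made for illustration.

```python
def level_set_transform(f1, p, grid_size=10000):
    """Approximate the Lebesgue measure of {x in (0, 1) : f1(x) > f1(p)}."""
    target = f1(p)
    # The fraction of midpoint-grid points inside the level set approximates its length.
    inside = sum(1 for k in range(grid_size)
                 if f1((k + 0.5) / grid_size) > target)
    return inside / grid_size

def beta22_density(x):
    """Hypothetical f1: Beta(2, 2) density, peaked at 0.5 and nowhere constant."""
    return 6.0 * x * (1.0 - x)
```

For this symmetric density the level set through $p$ is the interval between $p$ and $1 - p$, so the sketch returns approximately $|1 - 2p|$, which is the same map as transformation (1).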

Example: Let $P_{0s} \sim U^2(0, 1)$ and $P_{1s} \sim F_1(0, 1)$, where $F_1(0, 1)$ is a 2-dimensional circularly symmetric distribution
centered around (.5, .5) and supported on $[0, 1]^2$, as depicted in Figure 6. Under the proposed domain transformation,
$\{x : f_1(x) > f_1(P)\}$ for some $P$ is always a disk, $D$, centered around (.5, .5), the edge of which goes through $P$.
The transformed p value is the area of this disk, i.e. $\tilde{P} = \int_D 1\,dx$, $D = \{x : f_1(x) > f_1(P)\}$. This transformation
is depicted in Figure 7(a). Now consider a different transformation which computes the area radially outside the
observed value and maps that area to $\tilde{P}$, which is depicted in Figure 7(b).
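
The disk-area transformation of this example can be sketched as below. One detail is an assumption beyond the text: for points farther than 0.5 from the center, the disk is clipped to the support $[0, 1]^2$ so that the transformed value stays in (0, 1). The function name, sample size, and seed are likewise illustrative.

```python
import math
import random

def disk_transform(p):
    """Area of the disk centered at (.5, .5) through p, clipped to [0, 1]^2."""
    r = math.hypot(p[0] - 0.5, p[1] - 0.5)
    area = math.pi * r * r
    if r > 0.5:
        # The disk extends past all four edges; subtract the four circular
        # segments cut off by the edges (disjoint for r <= sqrt(0.5)).
        segment = r * r * math.acos(0.5 / r) - 0.5 * math.sqrt(r * r - 0.25)
        area -= 4.0 * segment
    return area

rng = random.Random(1)                        # fixed seed, illustrative only
null_points = [(rng.random(), rng.random()) for _ in range(20000)]
transformed = [disk_transform(p) for p in null_points]
# For a measure invariant map, the fraction of null samples mapped below t is close to t.
frac_below = {t: sum(v <= t for v in transformed) / len(transformed)
              for t in (0.1, 0.5, 0.9)}
```

The empirical fractions in `frac_below` give a quick check that uniformly distributed null samples remain approximately uniform after the transformation.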

Note that in the corresponding 1-D space, our proposed transformation separates the distribution of $P_{1s}$ from
uniform much better than the radial transformation does. Furthermore, the resulting p values are concentrated near 0
for the proposed transformation, as depicted in Figure 8. Another method that has been used in multidimensional
cases can be found in [7], where FDR is applied separately for each dimension with varying thresholds so that one
can still preserve global FDR control. It turns out that this approach is sub-optimal as well and fails when the
distributions are not separable in any single dimension.

1 A uniform distribution achieving transformation is as follows: Let $g(y_1, y_2)$ be a 2-D distribution. Define $p(y_1) = \iint g(t, s)\,dt\,ds$
and $p(y_2 \mid y_1) = \int g(s \mid y_1)\,ds$, where $g(s \mid y_1)$ denotes the conditional distribution of the 2nd dimension given the first.

Figure 6: Non-normalized density of 2 dimensional $P_{1s}$

Figure 7: Illustration of transformation from 2 dimensions to single dimension: (a) Domain transformation with respect to $F_1(0, 1)$; (b) Radial transformation.



Figure 8: Non-normalized density of 1 dimensional $P_{1s}$

To establish our results we consider the general 1-D setting for simplicity of exposition. These definitions
and related results apply to the multi-dimensional setting without any changes where the definition of p value satisfies
$P_{0s} \sim U^k(0, 1)$. For simplicity of notation, we will drop the sensor subscript $s$, adopting $P_0$ for $P_{0s}$, $P_1$ for $P_{1s}$,
and similarly for densities and distributions. In fact, the transformation $T$ we define is associated with a sensor and
should be denoted $T_s$; however, we omit the subscript. Now let $f_1(\cdot)$ be the PDF of $P_1$, which exists since $P_1$ is a
continuous random variable. Define the transformation $T$ as follows:

1. Let $y_{max} = \sup_x \{f_1(x)\}$. Define $\alpha_\mu(y) = E_U[I_{\{f_1(x) > y\}}(x)]$ and $\beta_\mu(y) = E_\mu[I_{\{f_1(x) > y\}}(x)]$ for $y \in
(0, y_{max})$, where $\mu$ is the measure of $P_1$, i.e. $\mu(A) = \int_A f_1$. Intuitively, $\alpha_\mu(y)$ captures the length of the set
$\{x : f_1(x) > y\}$, and $\beta_\mu(y)$ captures the probability of $P_1$ falling in the set $\{x : f_1(x) > y\}$.

2. Generate a new measure: $\tilde{\mu}(0, \alpha_\mu(y)) = \beta_\mu(y)\ \forall y \in (0, y_{max})$. If $\alpha_\mu(y)$ has a jump at $y = y_0$ from $a$ to $b$,
then set

$$\tilde{\mu}(0, z) = \frac{\beta_\mu(y_0^-) - \beta_\mu(y_0^+)}{b - a}(z - a) + \beta_\mu(y_0^+)$$

for $z \in (a, b)$, which corresponds to a conditionally uniform distribution in $(a, b)$. Let $\tilde{f}_1(\cdot)$ be the corresponding
density of $\tilde{\mu}$.

3. Generate the transformed random variable $\tilde{P} = T[P]$ as follows: For $P \in (0, 1)$ find $Y = f_1(P)$; then find
the set $S = \{x : \tilde{f}_1(x) = Y\}$ and choose $\tilde{P}$ randomly from $S$.
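
The quantities in steps 1 and 2 can be approximated numerically. The sketch below is a grid-based illustration under stated assumptions: $f_1$ is taken to be a hypothetical Beta(2, 2) density, the midpoint grid and the list of levels are arbitrary, and jumps of $\alpha_\mu$ (flat regions of $f_1$) are not handled.

```python
def alpha_beta(f1, y, grid_size=20000):
    """Grid approximations of alpha_mu(y) and beta_mu(y) for a density f1 on (0, 1)."""
    dx = 1.0 / grid_size
    alpha = 0.0    # length of {x : f1(x) > y}
    beta = 0.0     # probability of P_1 falling in {x : f1(x) > y}
    for k in range(grid_size):
        fx = f1((k + 0.5) * dx)
        if fx > y:
            alpha += dx
            beta += fx * dx
    return alpha, beta

def beta22_density(x):
    """Hypothetical f1 (Beta(2, 2) density), used only for illustration."""
    return 6.0 * x * (1.0 - x)

# The points (alpha_mu(y), beta_mu(y)) trace the map z -> mu_tilde(0, z) of step 2,
# i.e. the distribution function of the transformed variable.
levels = [1.5 * j / 50 for j in range(50, -1, -1)]       # y from y_max = 1.5 down to 0
curve = [alpha_beta(beta22_density, y, 2000) for y in levels]
```

Successive slopes of `curve` approximate the transformed density; for this choice of $f_1$ they decrease as $z$ grows, which is the behavior the transformation is designed to produce.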

Various elements involved in this definition can be understood through Figure 9. Note that the above general
definition reduces to a simple expression for nowhere constant $f_1$: $\tilde{P} = E_U[I_{\{f_1(x) > f_1(P)\}}(x)] = \int_A dx$, $A = \{x :
f_1(x) > f_1(P)\}$. To firmly establish the validity of these definitions we need to ensure that $\beta_\mu$ can be regarded as
a distribution function and that $\tilde{\mu}$ is absolutely continuous with respect to the Lebesgue measure, which we do in
the appendix.

Figure 9: Illustration of transformation elements: (a) $\alpha_\mu(y)$ and $\beta_\mu(y)$; (b) The new measure $\tilde{\mu}$.

In summary, the Domain Transformed BH (DTBH) procedure involves: (a) apply the transformation $T$ to realizations
of the random variables $P_0$ and $P_1$; (b) follow the BH procedure.

We must now establish two facts pertaining to $T$. First we must show that $T$ is measure invariant with respect to
the distribution of $P_0$, i.e. $T[P_0] \sim U(0, 1)$. Recall that we need this condition for the DTBH procedure to control FDR
at desired levels. Next we need to show that realizations of $T[P_1]$ are indeed clustered near 0. We establish the
former via the following two results:

Proposition 3.3 $T$ is a measure invariant transformation with respect to $U(0, 1)$.

Proof: By definition $T$ maps countable sets to singletons. Furthermore, sets of non-zero Lebesgue measure are
mapped to sets of the same Lebesgue measure, hence the uniform distribution is preserved. ∎

Proposition 3.4 The DTBH procedure controls FDR at the same level as BH procedure. 



Proof: By Proposition 3.3, $T[P_0] \sim U(0, 1)$. The result then follows from Proposition 3.2.



Now that we have shown DTBH controls FDR at desired levels, we are left to show that realizations of $T[P_1]$
are clustered near 0. Formally, we show that $T[P_1]$ has a monotonically decreasing density. This result will be of
great importance in proving improved performance of the DTBH procedure over the BH procedure.

Proposition 3.5 $T$ converts an arbitrary continuous density of $P_1$ to a monotonically decreasing density over (0, 1);
i.e. $\tilde{f}_1(p)$ is monotonically decreasing in $p$.

Proof: See appendix.



3.3 Performance Comparisons 

Here we define threshold strategies, and show that with the domain transformation the optimal detection rule is a 
threshold strategy. We also show that DTBH procedure has stronger detection power in comparison to BH procedure. 

Definition 3.6 Assume a partitioning problem of a set of observations $X = \{x_1, x_2, \ldots, x_m\}$ into two subsets, $X_1$ and $X_2$, such that $X_1 \cap X_2 = \emptyset$ and $X_1 \cup X_2 = X$. A threshold strategy is one that computes a threshold $t(x_1, x_2, \ldots, x_m)$ and partitions $X$ into two sets: $X_1 = \{x \in X : x < t(x_1, x_2, \ldots, x_m)\}$ and $X_2 = \{x \in X : x \ge t(x_1, x_2, \ldots, x_m)\}$.

We have the following theorem for the case where the object locations are all equally likely on the sensor network but the parameter governing the likelihood of an object at a location is nevertheless unknown.

Theorem 3.7 Let all object locations be equally likely on the sensor network, i.e. $\Pr\{H^i = H_0\} = \Pr\{H^j = H_0\}$, $\forall i, j : 1, \ldots, m$, and let $f_0(\cdot) = U(0, 1)$. If $f_1(\cdot)$ is a monotonically decreasing PDF, a thresholding strategy is optimal.

Proof: See appendix. ■
Before proceeding any further, the term stochastically larger [18] must be introduced: we say that the random variable $X$ is stochastically larger than the random variable $Y$, denoted $X \ge_{st} Y$, when $F_X(a) \le F_Y(a)$ for all $a$.

Lemma 3.8 Let $X_1, \ldots, X_n \in (0, 1)$ be $n$ independent random variables with common density function $f_X$ and let $Y_1, \ldots, Y_n \in (0, 1)$ be $n$ independent random variables with common density function $f_Y$. Also, let $X_{(i)}$ and $Y_{(i)}$ denote the $i$th smallest of $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ respectively. If $F_X(t) \ge F_Y(t)$ $\forall t \in (0, 1)$, then $Y_{(i)} \ge_{st} X_{(i)}$.

Proof: See appendix. ■
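The ordering in Lemma 3.8 can be checked numerically at a single point $t$. The sketch below uses the standard binomial form of the order-statistic CDF, $\Pr\{X_{(i)} \le t\} = \sum_{j \ge i} \binom{n}{j} F(t)^j (1 - F(t))^{n-j}$, which is equivalent to the integral form used in the appendix. The function name and the sample values of $F_X(t)$ and $F_Y(t)$ are illustrative assumptions.

```python
from math import comb

def order_stat_cdf(F_t, i, n):
    """CDF at t of the i-th smallest of n i.i.d. draws, given F_t = F(t).

    The i-th smallest is <= t exactly when at least i of the n draws are <= t.
    """
    return sum(comb(n, j) * F_t**j * (1.0 - F_t)**(n - j) for j in range(i, n + 1))

# Hypothetical values with F_X(t) >= F_Y(t) at some fixed t.
F_X_t, F_Y_t, n = 0.6, 0.4, 5
for i in range(1, n + 1):
    # The CDF of X_(i) at t is at least that of Y_(i), i.e. Y_(i) >=_st X_(i).
    print(i, order_stat_cdf(F_X_t, i, n) >= order_stat_cdf(F_Y_t, i, n))
```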
The important implication of this lemma is captured in the following theorem. 

Theorem 3.9 For any given set of p values with known distributions and any integer $k$, the probability of declaring the first $k$ p values as significant is larger under the DTBH procedure than under the BH procedure.

Proof: Let $\hat{P}_1 = T[P_1]$. By construction of the transformation, the density of $\hat{P}_1$, $\hat{f}_1$, dominates the density of $P_1$, $f_1$. In other words, $\hat{P}_1 \le_{st} P_1$. Therefore, the results of Lemma 3.8 apply to the random variables $\hat{P}_1$ and $P_1$.

First, assume that the observations contain only samples from $H_0$. Since the random variable $T[P_0]$ is stochastically equivalent to $P_0$, the probability of declaring $k$ of them significant is equal for all $k$ with both procedures. Let $p'$ be the $k$th p value. Next, when the samples of $H_1$ are added one by one, the probability of the index of $p'$ increasing to $k + 1$ is larger with the addition of samples from $T[P_1]$ than with the addition of samples from $P_1$. This is because $P_1$ is stochastically larger than $T[P_1]$. But, since $p' < k\gamma/m$ implies $p' < (k + 1)\gamma/m$, the DTBH procedure increases the probability of a p value being declared as significant. Furthermore, this argument is valid for all $k \le m$, since $T$ converts an arbitrary continuous density of $P_1$ to a monotonically decreasing one, which concludes the proof. ■
Figure 10 demonstrates the detection power of the DTBH procedure in comparison to that of the BH procedure. The former is uniformly stronger than the latter.

Figure 10: Comparison of detection performance for BH and DTBH procedures. (a) Original and transformed PDF of $P_1$ (not normalized). (b) Detection rate vs FDR level $\gamma$.

Next we describe an algorithmic solution to the distributed detection problem for the ideal sensing model. The distributed solution is based on the DTBH procedure, and can be seen as a distributed implementation of DTBH with communication constraints.



3.4 Distributed Detection with Ideal Sensing Model 

The DTBH procedure consists of two main parts. The first part is the domain transformation, which does not require any communication between sensor nodes. This is because the distribution of the random variable $P_1^s$ is available at sensor $s \in S$, and the domain transformation depends only on this information. Therefore, it can be applied at each sensor node locally. The second part of the DTBH procedure is the BH procedure itself, which requires ordering of p values. Since ordering of p values is costly in terms of communications, we use a sequential method to accomplish the linearly increasing thresholding of the BH procedure. See [15] on how single bit information can be transmitted efficiently to implement this sequential method. As discussed earlier, we measure communication complexity by the number of broadcast messages.

Distributed DTBH Algorithm: At iteration $t$ each sensor keeps a threshold variable $l(i_t) = i_t\gamma/m$ and a bit counter $count_t$. Initialize $i_1 = 1$ and $count_0 = 0$. Then:

0. Each sensor performs the domain transformation.

1. Sensor $j$ decides $H_1^j$ if $p_j < l(i_t)$ and $H_0^j$ otherwise. If it decides $H_1^j$, it announces this to the network if it has not done so at iterations $1, \ldots, t - 1$.

2. Assume $R_t$ sensors decide $H_1$ and declare to the network. Set $i_{t+1} = i_t + 1$ and $count_t = count_{t-1} + R_t$.

3. If $count_t \ge i_t$, mark iteration $t$ as $t_{max}$.

4. If $i_t = m$ or $R_t = 0$, label the sensors that declared $H_1$ until iteration $t_{max}$ as observing an object and quit the algorithm; else go to step 1.
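The steps above can be sketched as a centralized simulation in which a broadcast is modeled by flipping an entry of a boolean array. This is a minimal sketch under stated assumptions: the p values passed in are taken to be already domain transformed (step 0), the stopping test in step 4 is read as $R_t = 0$, and the `stop_on_silence` and `bit_budget` parameters are hypothetical names introduced here to toggle that test and to cap the counter.

```python
import numpy as np

def distributed_dtbh(pvals, gamma, bit_budget=None, stop_on_silence=True):
    """Simulate the sequential thresholding on transformed p values.

    Returns (detected, count): a boolean mask of sensors labeled as observing
    an object, and the number of announcement bits that were broadcast.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    announced = np.zeros(m, dtype=bool)
    announce_iter = np.zeros(m, dtype=int)  # iteration of each announcement
    count, t_max = 0, 0
    for i_t in range(1, m + 1):
        if bit_budget is not None and count >= bit_budget:
            break                            # assumed cap on the counter
        threshold = i_t * gamma / m          # l(i_t)
        new = (p < threshold) & ~announced   # step 1: first-time announcers
        r_t = int(new.sum())
        announced |= new
        announce_iter[new] = i_t
        count += r_t                         # step 2
        if count >= i_t:                     # step 3
            t_max = i_t
        if r_t == 0 and stop_on_silence:     # step 4 (first crossing)
            break
    detected = announced & (announce_iter <= t_max)
    return detected, count
```

For example, `distributed_dtbh([0.01, 0.02, 0.03, 0.9], 0.1)` labels the first three sensors using three announcement bits. With `[0.03, 0.04, 0.9, 0.95]` the default run stops silently at the first iteration and labels nothing, whereas `stop_on_silence=False` continues through all thresholds and labels the first two sensors.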

The distributed algorithm described above leads to the same decision rule as the centralized BH procedure. However, when there is a communication constraint of $\alpha$ bits for the SNET, we only need to put a cap on the count variable and perform the distributed BH algorithm while $count_t < \alpha$.

The aforementioned distributed algorithm is based on a linear increase in the threshold at each step, which yields FDR control at level $\gamma m_0/m$. This introduces inherent conservatism in cases where the number of objects is a finite non-zero fraction. We have analyzed an alternative strategy in [10]. Therein, at each count update step an estimate of the actual number of targets, $m_1$, is computed based on the number of sensors declared as significant. The threshold $l(i_t)$ is then adjusted based on the estimated target density. Our simulation results indicated that this strategy leads to a much better detection power, as seen in Figure 11.



Figure 11: Constraint update through learning: The estimate of $m_0$ allows for update of threshold $\gamma$ for better detection rates.

Scaling Property: We now establish the simple but very important scaling property of the distributed DTBH procedure. It is due to this property that we can limit the communication budget with an upper bound that depends on the number of sensors that have an object within their sensing range, and not the total number of sensors in the SNET.

Theorem 3.10 Let $m_1 = m - m_0$ be the number of sensors with positive hypothesis. The expected ratio of $m_1$ to the number of sensors that are declared to be significant ($R$) is lower bounded by $1 - \gamma$; i.e. $E\{m_1/R\} \ge 1 - \gamma$.

Proof: We know that the BH procedure guarantees $E\{V/R\} = E\{V/(V + S)\} \le \gamma$. Evidently $S \le m_1$, meaning the number of correct detections cannot exceed the number of sensors with positive hypotheses. Then:

$$E\{V/(V + S)\} = 1 - E\{S/(V + S)\} \;\Rightarrow\; E\{S/(V + S)\} \ge 1 - \gamma$$

$$1 - \gamma \le E\{S/(V + S)\} \le E\{m_1/(V + S)\} = E\{m_1/R\}$$

which concludes the proof of this theorem. ■

3.4.1 Distributed DTBH achieves Performance of Last Crossing Procedure 

Note that the distributed algorithm is a first-crossing procedure. Once the threshold line crosses below the ordered p values, the algorithm terminates. The issue is that to maintain FDR control it is only required to terminate at the last crossing. Therefore, there could be degradation in performance if one terminates at the first crossing. When there is an asymptotically large number of observations, the ordered p values form a convex function. In that case, the first-crossing and the last-crossing procedures are the same, and they terminate at the same point. However, when there are not enough samples, the ordered p values do not form a convex function, and therefore the first-crossing and the last-crossing procedures have different termination points. Below we show that the above procedure achieves the last-crossing performance with high probability.

Let the largest p value that is below its corresponding threshold be $p'$. For this part, let $\theta_0 = m_0/m$ and $\theta_1 = m_1/m$ be the ratios of observations from each hypothesis. We use the definition $l_i = i\gamma/m$, $i = 1, \ldots, m$, as the threshold line for the centralized BH procedure. We assume that the PDF of $P_1$ is monotonically decreasing due to the domain transformation.

Lemma 3.11 Let $p_{(k)}$ be the $k$th smallest p value. If $E(p_{(\lceil k/(1-\epsilon) \rceil)}) < l_k$, then $\Pr\{p_{(k)} > l_k\}$ decays exponentially fast with $k$.

Proof: Let $N_k = \#\{j : p_j < l_k\} = \sum_{j=1}^{m} I_{\{p_j < l_k\}}$. By the switching lemma (see for example [1]) the following relationship holds for any $k$: $\{E(p_{(\lceil k/(1-\epsilon) \rceil)}) < l_k\} \Leftrightarrow \{E(N_k) > \lceil k/(1-\epsilon) \rceil\}$. Therefore, $E(p_{(\lceil k/(1-\epsilon) \rceil)}) < l_k \Rightarrow E(N_k) > k/(1-\epsilon)$ and $k < E(N_k)(1-\epsilon)$.



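$$\begin{aligned}
\Pr\{p_{(k)} > l_k\} = \Pr\{N_k < k\} &\le \Pr\{N_k < E(N_k)(1-\epsilon)\} \\
&\le \exp\left\{-\frac{\epsilon^2 E(N_k)}{2}\right\} \qquad (2) \\
&\le \exp\left\{-\frac{\epsilon^2 k}{2(1-\epsilon)}\right\} \qquad (3)
\end{aligned}$$

Inequality (2) follows from the Chernoff bound, and inequality (3) follows from the application of the switching relation along with the assumption of the lemma. ■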
The implication of this lemma is that after a certain number of p values, say k, are tested against their corre- 
sponding thresholds, one can decide whether or not to continue the distributed algorithm with an exponentially small 
probability of error. 

This result further suggests presetting k tests at the beginning of the algorithm, which must be performed regard- 
less of the outcome. Note, however, that k can be fixed a priori and does not depend on the size of the SNET. We 
next show that such a modification does not affect important properties of our distributed algorithm. 
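To get a feel for how small the number of preset tests can be, the sketch below inverts the exponential bound, assuming inequality (3) reads $\exp\{-\epsilon^2 k / (2(1-\epsilon))\}$. The function names and the sample values of $\epsilon$ and the target error probability `delta` are illustrative assumptions, not values from the paper.

```python
import math

def crossing_error_bound(k, eps):
    """Assumed form of inequality (3): exp(-eps^2 * k / (2 * (1 - eps)))."""
    return math.exp(-eps**2 * k / (2.0 * (1.0 - eps)))

def preset_tests_needed(eps, delta):
    """Smallest integer k for which the assumed bound is at most delta."""
    return math.ceil(2.0 * (1.0 - eps) * math.log(1.0 / delta) / eps**2)

k = preset_tests_needed(eps=0.5, delta=0.01)
print(k, crossing_error_bound(k, 0.5))
```

Note that the resulting $k$ depends only on $\epsilon$ and the target error probability, not on the number of sensors $m$.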

Theorem 3.12 Consider the distributed detection algorithm with k preset tests. For that distributed implementation: 

a) FDR $\le \gamma$ and

b) The expected number of bits required to detect the objects is upper bounded by maxjfc, E(f^)} 

Proof: a) We show this part by showing that the distributed algorithm is in fact equivalent to the centralized algorithm, and that presetting $k$ tests affects only the communication cost. If there exists a $p_{(i)} < i\gamma/m$, $i > k$, then the effect of the $k$ preset tests is washed out. This is because the centralized algorithm would also declare all p values less than $p_{(i)}$ significant. Therefore FDR $\le \gamma$ in this case. If there is no such $p_{(i)}$, $i > k$, then the algorithm chooses the largest p value $p_{(j)} < j\gamma/m$, $j \le k$, and declares all smaller p values significant. Therefore the distributed algorithm is equivalent to the centralized FDR procedure, hence FDR $\le \gamma$.

b) Without the $k$ preset tests the upper bound was given by Theorem 3.10, and the result is immediate from there. ■



4 Robustness of DTBH and Non-Ideal Sensing Model 

The main results of this section are directed toward control of FDR via the DTBH procedure when the distribution of observations under the null hypotheses is not known exactly. The reasoning for developing robustness from this perspective is as follows: in the non-ideal sensing model the sensors that are outside the effective region receive a small residual signal from the objects. Since the received signal is not known, the exact distribution of observations under the null hypotheses is not available to calculate the p values. Furthermore, in multidimensional settings it may not be possible to obtain the level sets exactly, and this introduces an uncertainty in the distributions that are used to perform the domain transformation. This leads to a deviation of the $P_0$ distribution from $U(0, 1)$. Our goal here is to quantify how FDR is affected when DTBH is used with the non-ideal sensing model.

For this section we again assume that an appropriate definition of p values has been chosen, and that we are working in the p space as opposed to the original observation space. Further, we assume that a domain transformation $T$ has been chosen that preserves the uniform distribution. Then we establish three main results. First, we show that FDR scales gracefully when the $P_0$ distribution deviates from $U(0, 1)$ by $\epsilon$ under a suitable metric. Next, we show that the domain transformation preserves the distance $\epsilon$ between the distribution of $P_0$ and $U(0, 1)$. These results allow us to identify one topology in which our developments can address the distributed detection problem with the non-ideal sensing model. We then show how the non-ideal sensing model can be addressed with the proposed method.

For continuous families $\mathcal{F}_\epsilon$ such that $\mathcal{F}_\epsilon = \{F_0 : |F_0(x) - x| \le \epsilon x\}$, we have an immediate non-asymptotic robustness result, which states that the FDR scales gracefully when the BH procedure is used for detection.

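Lemma 4.1 Let $P_0$ have continuous distribution $F_0(x)$. If $|F_0(x) - x| \le \epsilon x$, the BH procedure bounds the false discovery rate by $\gamma(1 + \epsilon)$, i.e. FDR $\le \gamma(1 + \epsilon)$.

Proof: Define $l_k = k\gamma/m$. Let $P_{0i}$ be the $m_0$ p values. Denote with $C_i(k)$ the event that if $p_i$ is declared $H_1$, exactly $k - 1$ other p values are declared $H_1$. Then:

$$\begin{aligned}
E(V/R) &= \sum_{i=1}^{m_0} \sum_{k=1}^{m} \frac{1}{k} \Pr\{P_{0i} \le l_k, C_i(k)\} = \sum_{i=1}^{m_0} \sum_{k=1}^{m} \frac{1}{k} \Pr\{P_{0i} \le l_k\} \Pr\{C_i(k)\} \\
&\le \sum_{i=1}^{m_0} \sum_{k=1}^{m} \frac{1}{k}\, l_k (1 + \epsilon) \Pr\{C_i(k)\} = (1 + \epsilon) \sum_{i=1}^{m_0} \frac{\gamma}{m} \sum_{k=1}^{m} \Pr\{C_i(k)\} \\
&= (1 + \epsilon) \sum_{i=1}^{m_0} \frac{\gamma}{m} = (1 + \epsilon) \frac{m_0}{m} \gamma \le (1 + \epsilon)\gamma
\end{aligned}$$

The second equality follows because $P_{0i}$ is independent of all other p values. ■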



In order to extend Lemma 4.1 to the DTBH procedure, we need to show that the distance $\epsilon$ is preserved when we apply the domain transformation. The following lemma states this result.

Lemma 4.2 Let $P_0$ have continuous distribution $F_0(x)$ and $\hat{P}_0 = T[P_0]$ have continuous distribution $\hat{F}_0(x)$. If $|F_0(x) - x| \le \epsilon x$, then $|\hat{F}_0(x) - x| \le \epsilon x$.

Proof: The result is an immediate consequence of the fact that $T$ is a many to one mapping only over countable sets. ■

Combining the results of Lemma 4.1 and Lemma 4.2 we state the main robustness property of the DTBH procedure without the obvious proof.

Theorem 4.3 For continuous families $\mathcal{F}_\epsilon$ such that $\mathcal{F}_\epsilon = \{F_0 : |F_0(x) - x| \le \epsilon x\}$, the DTBH procedure bounds the false discovery rate by $\gamma(1 + \epsilon)$, i.e. FDR $\le \gamma(1 + \epsilon)$.



Before going further we note that Lemma 4.2 can be extended to families of size $\epsilon$ in the Kolmogorov or Prokhorov metrics, and these extensions allow us to consider singular distributions as well as continuous ones. The distributed detection algorithm can be modified to accommodate these more general families of distributions, which leads to a variant of Theorem 4.3. We present the proof of the extension of Lemma 4.2 to the Kolmogorov metric in the appendix and omit the modification of the distributed detection algorithm as well as the development of the variant of Theorem 4.3 for brevity.

Theorem 4.4 Let $\mu$ be the measure associated with $F(x) = x$ and $\mu_0$ be the measure associated with $F_0$. Define $\hat{F}$ and $\hat{F}_0$ to be the respective distributions after the transformation. If $d_{tv}\{\mu, \mu_0\} \le \epsilon$ then $\sup_x |\hat{F}(x) - \hat{F}_0(x)| \le \epsilon$, where for any measurable space $(\Omega, \mathcal{A})$, $d_{tv}(\mu, \nu) = \sup_{A \in \mathcal{A}} |\mu(A) - \nu(A)|$.



Non-Ideal Sensing Model: The robustness result stated in Theorem 4.3 presents us with an immediate modification to the DTBH algorithm in order to control the false discovery rate. It suggests that if we wish to control FDR at level $\gamma$, we only need to input the threshold $\gamma' = \gamma/(1 + \epsilon)$. Then the distributed DTBH algorithm presented in Section 3.4 can address the problem with the Non-Ideal Sensing Model, although with a performance loss. Here we present only the form of this loss, as it depends on distribution specific values.

Let $P_1$ have concave distribution $F_1(x)$. In [11] it has been shown that asymptotically the decision point of the BH procedure is $c$, where $c$ is the solution to

$$F_1(x) = \frac{1/\gamma - m_0/m}{m_1/m}\, x \qquad (4)$$

Asymptotically, this yields $E(T)_\gamma = (1 - F_1(c)) m_1$, which would have been the solution in the Ideal Sensing Model. Since there is uncertainty in the family $\mathcal{F}_\epsilon = \{F_0 : |F_0(x) - x| \le \epsilon x\}$, we use the new threshold in the DTBH procedure: $\gamma' = \gamma/(1 + \epsilon)$. This threshold will yield a new point $d$ such that $d$ is the solution to

$$F_1(x) = \frac{1/\gamma' - m_0/m}{m_1/m}\, x = \frac{(1 + \epsilon)/\gamma - m_0/m}{m_1/m}\, x \qquad (5)$$

We note that $d < c$ since the slope of the right hand side of equation (5) is larger than that of equation (4). The new threshold yields a new miss rate, i.e. $E(T)_{\gamma'} = (1 - F_1(d)) m_1$. The performance loss is then directly related to $c$ and $d$ via the function $F_1(x)$: $E(T)_{\gamma'} - E(T)_\gamma = (F_1(c) - F_1(d)) m_1$.
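The form of this loss can be explored numerically for a concrete concave $F_1$. The sketch below solves the two fixed-point equations by bisection and evaluates $(F_1(c) - F_1(d)) m_1$. The choice $F_1(x) = \sqrt{x}$, the parameter values, and the function names are illustrative assumptions; the bisection relies on $F_1$ being concave with $F_1(0) = 0$ so that there is a single positive crossing.

```python
import math

def bh_decision_point(F1, gamma, theta0, tol=1e-12):
    """Solve F1(x) = ((1/gamma - theta0) / (1 - theta0)) * x for x in (0, 1).

    theta0 = m0/m. Assumes F1 is concave, lies above the line near 0 and
    below it at x = 1, so bisection brackets the unique positive crossing.
    """
    slope = (1.0 / gamma - theta0) / (1.0 - theta0)
    lo, hi = tol, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F1(mid) > slope * mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def miss_increase(F1, gamma, eps, theta0, m1):
    """Return (c, d, loss) with loss = (F1(c) - F1(d)) * m1."""
    c = bh_decision_point(F1, gamma, theta0)
    d = bh_decision_point(F1, gamma / (1.0 + eps), theta0)
    return c, d, (F1(c) - F1(d)) * m1

c, d, loss = miss_increase(math.sqrt, gamma=0.15, eps=0.1, theta0=0.9, m1=100)
print(c, d, loss)
```

For $F_1(x) = \sqrt{x}$ the crossing has the closed form $x = 1/\text{slope}^2$, which gives a simple check on the bisection.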

5 Simulation Results 

Below we present a detection simulation in which we use BH and DTBH procedures. The sensor field for the 
simulations is a grid of size 100x100, where each pixel is assumed to have a sensor, and the sensors observe the 
signal within their pixel. The null hypothesis for a sensor is that it is outside the effective region of all objects, and the alternative is that it is inside the effective region of an object. Then, with $n_s$ and $v_s$ being the noise at sensor $s$ under the null and alternative hypotheses respectively, the observation model at sensor $s$ for the non-ideal case is as follows:

$H_0 : X_s = \xi_s + n_s$
$H_1 : X_s = S + v_s$

In the ideal sensing model, $\xi_s = 0$ and $S = \theta$. In the non-ideal sensing model, $\xi_s \in [0, 0.1]$ and $S \in [\theta - 0.1, \theta]$. Here we have absorbed the perturbation terms into $\xi_s$ and $S$. We chose a demonstrative distribution for $v_s$ and a demonstrative value for $\theta$. We note that similar results are obtained when these values are varied. In cases where $\theta$ is smaller, the detection rate of the DTBH method remains the same, whereas the detection rate of the BH procedure degrades significantly.

The results demonstrate the robustness of the DTBH procedure to such non-ideal sensing scenarios. For the simulation, the FDR threshold was set to $\gamma = 0.15$, $\theta = 2.8$, and the effective radius of the object to $r_{eff} = 2.5$ pixels, with $n_s \sim N(0, 1)$ and $v_s \sim N(0, 0.05)$. There were 10 objects on the field. The communication constraint $\alpha$ was varied and the results are presented for illustrative cases in Figure 12 and Figure 13 for the ideal and non-ideal sensing models respectively.
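A setup of this kind could be generated along the following lines. This is only a sketch and not the authors' simulation code: it assumes object centers drawn uniformly on the field, treats the second argument of $N(0, 0.05)$ as a standard deviation, draws the non-ideal perturbations uniformly from the stated intervals, and uses a one-sided upper-tail p value under the nominal null $N(0, 1)$. All function and parameter names are hypothetical.

```python
import math
import numpy as np

def simulate_field(size=100, n_objects=10, radius=2.5, theta=2.8,
                   sigma_null=1.0, sigma_alt=0.05, nonideal=False, seed=0):
    """Return (h1, x, pvals) for one sensor per pixel on a size x size grid."""
    rng = np.random.default_rng(seed)
    rows, cols = np.mgrid[0:size, 0:size]
    centers = rng.uniform(0, size, size=(n_objects, 2))  # assumed placement
    h1 = np.zeros((size, size), dtype=bool)
    for r, c in centers:
        # A sensor is under H1 if its pixel lies within the effective radius.
        h1 |= (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
    if nonideal:
        xi = rng.uniform(0.0, 0.1, size=h1.shape)          # residual signal
        s = rng.uniform(theta - 0.1, theta, size=h1.shape)  # perturbed signal
    else:
        xi = np.zeros(h1.shape)
        s = np.full(h1.shape, theta)
    x = np.where(h1,
                 s + rng.normal(0.0, sigma_alt, h1.shape),
                 xi + rng.normal(0.0, sigma_null, h1.shape))
    # Upper-tail p value under the nominal null N(0, sigma_null^2).
    pvals = 0.5 * np.vectorize(math.erfc)(x / (sigma_null * math.sqrt(2.0)))
    return h1, x, pvals

h1, x, pvals = simulate_field(nonideal=True)
print(h1.sum(), pvals[h1].mean(), pvals[~h1].mean())
```

Under these assumptions the p values of sensors inside an effective region concentrate around a small positive value rather than at 0, which is the kind of non-monotone density of $P_1$ that the domain transformation is designed to handle.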

In the ideal sensing model, for $\alpha < 150$ the implementation of the BH procedure was unable to detect the sensors with $H_1$ hypotheses, whereas the DTBH procedure was able to do so. As the communication constraint was loosened, the performance of the DTBH procedure increased accordingly, while keeping the false alarms at low levels. Although the BH procedure also detected some sensors with $H_1$ hypotheses, observe that the BH procedure suffers from more false alarms.

In the non-ideal sensing model, note that the BH procedure fails to detect with any amount of communication 
budget. This is because the ordered p values are always above their corresponding thresholds. However, although 
the exact distribution is not known under positive hypotheses, the domain transformation is performed successfully, 
and this allows for successful detection of significant sensors.

6 Appendix

Proof of Theorem 1 First note that from Lagrangian duality it follows that, > 75. Consequently, we are left 
to establish a bound for the Bayesian problem. Now we observe that the error event, 

E = {u(Y N ) ^{H s :se S}} = {V > 1} U {T > 1} 

Therefore, from Fano's inequality it follows that for any strategy (j>: 

Pr(V > 1) +Pr(T > 1) > Pr(E) > ^{H s : s e S} \ Y N ) - 1 = <S>(H S \ Y a ) - ^ 

The last equality follows from the independence assumptions. I 
Technical Details of Domain Transformation: 

Proposition 6.1 $\hat{F}_1(\alpha_\mu(y)) = \hat{\mu}(0, \alpha_\mu(y)) = \beta_\mu(y)$ is a distribution function.

Proof: To show that $\hat{F}_1(\cdot)$ is a distribution function, we need to show that (1) $\hat{F}_1(\cdot)$ is monotone increasing, (2) $\hat{F}_1(\cdot)$ is right continuous, and (3) $\lim_{x \to -\infty} \hat{F}_1(x) = 0$ and $\lim_{x \to +\infty} \hat{F}_1(x) = 1$.

(1) $\hat{F}_1(\alpha_\mu(y))$ is monotone increasing in $\alpha_\mu(y)$. This is because $\alpha_\mu(y)$ increases as $y$ decreases, and as $y$ decreases $\beta_\mu(y) = E[I_{\{f_1(x) > y\}}(x)]$ increases.

(2) Next, $\hat{F}_1(\alpha_\mu(y))$ is a right-continuous function. To show this, consider some $\alpha'$, and appropriate $y'$ and $\beta'$. $\alpha_\mu(y) \downarrow \alpha'$ as $y \uparrow y'$. But $y \uparrow y' \Rightarrow \beta_\mu(y) \downarrow \beta'$, and since $E[I_{\{f_1(x) > y\}}(x)]$ is right continuous, so is $\hat{F}_1(\alpha_\mu(y)) = \beta_\mu(y)$.

(3) Finally, we show that $\lim_{\alpha_\mu(y) \to 0} \hat{F}_1(\alpha_\mu(y)) = 0$ and $\lim_{\alpha_\mu(y) \to 1} \hat{F}_1(\alpha_\mu(y)) = 1$. The reason that we only consider 0 and 1 as the limit points is that $\alpha_\mu(y) \in [0, 1]$ by definition.

Note that since $f_1$ is the PDF of a continuous random variable, there exists a $y'$ such that $\{x : f_1(x) > y'\}$ is empty. Therefore, as $y \uparrow y'$, $\alpha_\mu(y) \downarrow \alpha_\mu(y') = 0$, and $\beta_\mu(y) \downarrow \beta_\mu(y') = 0$. This shows the first part. Next, consider the case when $y \downarrow 0$. In this case $\alpha_\mu(y) \uparrow \alpha_\mu(0) = 1$ and $\beta_\mu(y) \uparrow \beta_\mu(0) = 1$. This establishes that $\hat{F}_1(\alpha_\mu(y))$ is a distribution function. ■

We now show that $\hat{\mu}$ is absolutely continuous with respect to Lebesgue measure, and thus admits a PDF.

Proposition 6.2 $\hat{\mu}$ is absolutely continuous with respect to Lebesgue measure.

Proof: Over any zero measure set $A$ with respect to Lebesgue measure, $\alpha_\mu(y) = 0$ and $\beta_\mu(y) = 0$, and therefore $\hat{\mu}(A) = 0$. ■






Proof of Proposition 3.5 Note that $\beta_\mu(y)$ is concave as a function of $\alpha_\mu(y)$. To show this consider $y_1 > y_2 > \ldots > y_n$ and a sequence of sets $A_1 \subset A_2 \subset \ldots \subset A_n$ such that $A_i = \{x : f_1(x) > y_i\}$. Note that $\mathcal{L}(A_1) \le \mathcal{L}(A_2) \le \ldots \le \mathcal{L}(A_n)$, where $\mathcal{L}$ denotes the Lebesgue measure. Since $\sup_{x \in A_{i+1} - A_i} f_1(x) \le \inf_{x \in A_i} f_1(x)$, $d\beta_\mu/d\alpha_\mu$ is monotonically decreasing in these sets. Therefore, the new measure $\hat{F}_1(\alpha_\mu) = \hat{\mu}(0, \alpha_\mu)$ is concave, and the proposition follows. ■

Proof of Theorem 3.7 Let $f_0(\cdot)$ and $f_1(\cdot)$ be PDFs of observations under $H_0$ and $H_1$ respectively. Let $\mathcal{P} = \{P_1, P_2, \ldots, P_m\}$ be given, where the $P_i$ are independent random variables having PDFs $f_0$ or $f_1$ with unknown prior probabilities $\Pr\{H^i = H_0\}$ and $\Pr\{H^i = H_1\}$ respectively. Now consider the partitioning problem as described in Definition 3.6.

Let $S$ be a decision rule that chooses $\mathcal{P}_S \subset \mathcal{P}$ and labels $H_1$. Define $P^* = \max_i\{P_i \in \mathcal{P}_S\}$ and $P_* = \min_i\{P_i \in \mathcal{P}_S^c\}$, where $\mathcal{P}_S^c = \mathcal{P} - \mathcal{P}_S$. Now define a new decision rule $S'$ as follows: if $\mathcal{P}_S \ne \emptyset$ and $P_* < p < P^*$ for some $p$, then $S'$ chooses $\mathcal{P}_{S'} = (\mathcal{P}_S - \{P^*\}) \cup \{P_*\}$ and labels $H_1$. In all other cases $S'$ chooses $\mathcal{P}_{S'} = \mathcal{P}_S$. With this setup, the following lemma establishes that strategy $S'$ suffers a smaller FDR than does strategy $S$.

Lemma 6.3 Let relevant quantities be defined as above. Assume that $\Pr\{H^i = H_0\} = \Pr\{H^j = H_0\}$, $\forall i, j : 1, \ldots, m$. If $f_0(\cdot) = U(0, 1)$ and $f_1(\cdot)$ is a monotonically decreasing PDF, then the false discovery rate of the strategy $S'$ is less than or equal to that of the strategy $S$, i.e., $FDR_S \ge FDR_{S'}$.

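Proof: Let $m = m_0 + m_1$ and let $\omega \in (0, 1)^m$. Let $B = \{\omega : P_* < p < P^*\}$ have a nonzero measure. Outside $B$, $S$ itself is a thresholding strategy and $S = S'$. Therefore we only need to consider the set $B$. Let $i^*(\omega)$ and $i_*(\omega)$ be the indices of $P^*$ and $P_*$ in the set $\mathcal{P}$. Now, consider a $B' \subset B$ in which $i^*(\omega) = i^*$ and $i_*(\omega) = i_*$ are fixed.

Define $D_S(i) = I(P_i \in \mathcal{P}_S)$, $D_{S'}(i) = I(P_i \in \mathcal{P}_{S'})$, and $A(i) = I(H^i = H_0)$. Then,

$$\begin{aligned}
FDR_S - FDR_{S'} &= E\left(\frac{V_S}{R_S} \,\Big|\, B'\right) - E\left(\frac{V_{S'}}{R_{S'}} \,\Big|\, B'\right) \\
&= E\left[\frac{\sum_{i=1}^{m} D_S(i)A(i)}{\sum_{i=1}^{m} D_S(i)} \,\Big|\, B'\right] - E\left[\frac{\sum_{i=1}^{m} D_{S'}(i)A(i)}{\sum_{i=1}^{m} D_{S'}(i)} \,\Big|\, B'\right] \\
&= E\left[\frac{\sum_{i=1}^{m} D_S(i)A(i)}{\sum_{i=1}^{m} D_S(i)} - \frac{\sum_{i=1}^{m} D_{S'}(i)A(i)}{\sum_{i=1}^{m} D_{S'}(i)} \,\Big|\, B'\right]
\end{aligned}$$

Now, note that the cardinalities of $\mathcal{P}_S$ and $\mathcal{P}_{S'}$ are the same, and $D_S(i) \ne D_{S'}(i)$ only for $i^*$ and $i_*$. Therefore we can rewrite the above difference as follows:

$$\begin{aligned}
FDR_S - FDR_{S'} &= E\left[\frac{\sum_{i \notin \{i^*, i_*\}} D_S(i)A(i) + D_S(i^*)A(i^*) + D_S(i_*)A(i_*)}{R_S} \right. \\
&\qquad \left. - \frac{\sum_{i \notin \{i^*, i_*\}} D_{S'}(i)A(i) + D_{S'}(i^*)A(i^*) + D_{S'}(i_*)A(i_*)}{R_S} \,\Big|\, B'\right] \\
&= E\left[\frac{D_S(i^*)A(i^*) - D_{S'}(i_*)A(i_*)}{R_S} \,\Big|\, B'\right] = E\left[\frac{A(i^*) - A(i_*)}{R_S} \,\Big|\, B'\right]
\end{aligned}$$

$$\begin{aligned}
FDR_S - FDR_{S'} &\ge \frac{1}{m} E\left[I(H^{i^*} = H_0) - I(H^{i_*} = H_0) \mid B'\right] \\
&= \frac{1}{m}\left[\Pr\{H^{i^*} = H_0 \mid B'\} - \Pr\{H^{i_*} = H_0 \mid B'\}\right] \\
&= \frac{1}{m}\left[\frac{\Pr\{H^{i^*} = H_0, B'\}}{\Pr\{B'\}} - \frac{\Pr\{H^{i_*} = H_0, B'\}}{\Pr\{B'\}}\right] \\
&= \frac{1}{m}\left[\frac{\Pr\{B' \mid H^{i^*} = H_0\}\Pr\{H^{i^*} = H_0\}}{\Pr\{B'\}} - \frac{\Pr\{B' \mid H^{i_*} = H_0\}\Pr\{H^{i_*} = H_0\}}{\Pr\{B'\}}\right] \\
&= \frac{1}{m}\left[\frac{\Pr\{P^* > p \mid H^{i^*} = H_0\}\Pr\{P_* < p\}\Pr\{H^{i^*} = H_0\}}{\Pr\{P^* > p\}\Pr\{P_* < p\}} \right. \\
&\qquad \left. - \frac{\Pr\{P_* < p \mid H^{i_*} = H_0\}\Pr\{P^* > p\}\Pr\{H^{i_*} = H_0\}}{\Pr\{P^* > p\}\Pr\{P_* < p\}}\right]
\end{aligned}$$

Here observe that $\Pr\{H^{i^*} = H_0\} = \Pr\{H^{i_*} = H_0\}$ by the hypothesis of the theorem, and $\Pr\{P_* < p\} = 1 - \Pr\{P^* > p\}$ due to the independence assumption. Therefore,

$$\begin{aligned}
FDR_S - FDR_{S'} &\ge \frac{\Pr\{H^{i^*} = H_0\}}{m}\left[\frac{(1 - p)(1 - \Pr\{P^* > p\})}{\Pr\{P^* > p\}\Pr\{P_* < p\}} - \frac{p\Pr\{P^* > p\}}{\Pr\{P^* > p\}\Pr\{P_* < p\}}\right] \\
&= \frac{\Pr\{H^{i^*} = H_0\}}{m\Pr\{P^* > p\}\Pr\{P_* < p\}}\left[1 - \Pr\{P^* > p\} - p + p\Pr\{P^* > p\} - p\Pr\{P^* > p\}\right] \\
&= \frac{\Pr\{H^{i^*} = H_0\}}{m\Pr\{P^* > p\}\Pr\{P_* < p\}}\left[F_{P^*}(p) - p\right] \ge 0
\end{aligned}$$

The last inequality follows from the fact that $f_0 = U(0, 1)$ and $f_1$ is monotonically decreasing. ■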



Now, to prove the result of Theorem 3.7 it suffices to iterate Lemma 6.3 whenever the set $S'$ is not a threshold set, i.e. the result of a threshold strategy. Specifically, if $S'$ is not a threshold set, redefine $S = S'$, generate a new $S'$, and repeat this procedure until $S'$ is a threshold set. ■

Proof of Lemma 3.8

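$$f_{X_{(i)}}(t) = \frac{n!}{(i-1)!(n-i)!} F_X(t)^{i-1} (1 - F_X(t))^{n-i} f_X(t)$$

$$F_{X_{(i)}}(t) = \frac{n!}{(i-1)!(n-i)!} \int_0^t F_X(x)^{i-1} (1 - F_X(x))^{n-i} f_X(x)\,dx = \frac{n!}{(i-1)!(n-i)!} \int_0^{F_X(t)} u^{i-1} (1 - u)^{n-i}\,du$$

By the same approach it is easy to see that

$$F_{Y_{(i)}}(t) = \frac{n!}{(i-1)!(n-i)!} \int_0^t F_Y(y)^{i-1} (1 - F_Y(y))^{n-i} f_Y(y)\,dy = \frac{n!}{(i-1)!(n-i)!} \int_0^{F_Y(t)} u^{i-1} (1 - u)^{n-i}\,du$$

By hypothesis of the lemma, $F_X(t) \ge F_Y(t)$ $\forall t \in (0, 1)$, and since $0 \le u \le 1$ the integrand is nonnegative, so it follows that $F_{X_{(i)}}(t) \ge F_{Y_{(i)}}(t)$ $\forall t \in (0, 1)$, which concludes the proof of the lemma. ■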
Proof of Theorem 4.4

Proof: Let $A_x \subset (0, 1)$ be the set that gets mapped to the set $(0, x)$ by the transformation. Since $d_{tv}\{\mu, \mu_0\} \le \epsilon$, by definition of total variation distance we have $|\mu(A_x) - \mu_0(A_x)| \le \epsilon$. Noting that $\mu(A_x) = \hat{\mu}(0, x)$ and $\mu_0(A_x) = \hat{\mu}_0(0, x)$ we have $|\hat{\mu}(0, x) - \hat{\mu}_0(0, x)| \le \epsilon$ and the result follows. ■

Figure 12: Detection performance of distributed implementations under the ideal sensing model for $\alpha = 150$ bits (b,d,f), $\alpha = 200$ bits (c,e,g). (A purely gray plot indicates that no detection was made.)

Figure 13: Detection performance of distributed implementations under the non-ideal sensing model for $\alpha = 150$ bits (b,d,f), $\alpha = 200$ bits (c,e,g). (A purely gray plot indicates that no detection was made.)